摘要象征性推理,如数学和逻辑等领域所需的,是人类智力的关键组成部分。这些域的求解器具有重要的应用,尤其是计算机辅助教育。但是学习解决符号问题是机器学习算法的具有挑战性。现有模型既可以从人体解决方案中学习或使用手工工程功能,使其在新域中应用昂贵。在本文中,我们将符号域视为一种简单的环境,其中指示了状态和操作作为非结构化文本,而二进制奖励指示是否解决了问题。这种灵活的设置可以轻松指定新域,但搜索和计划变得具有挑战性。我们介绍了由数学普通核心课程启发的四种环境,并观察到现有的加强学习基线表现不佳。然后,我们提出了一种新的学习算法,对比策略学习(Conpole),明确地优化了Inconce损失,下面限制了当前状态和下一个状态之间的相互信息,继续在解决方案的路径上。 Conpole成功解决了所有四个域。此外,Conpole学习的问题表示可以准确地预测真实数学课程中的问题类别。我们的成果建议在象征域中的加强学习的新方向,以及数学教育的应用。
translated by 谷歌翻译
对象检测是典型的多任务学习应用程序,其同时优化分类和回归。但是,分类损失总是以基于锚的方法的多任务损失主导,妨碍了任务的一致和平衡优化。在本文中,我们发现转移边界盒可以在分类中改变正面和负样本的划分,意思是分类取决于回归。此外,考虑到不同的数据集,优化器和回归损耗功能,我们总结了关于微调损耗重量的三个重要结论。基于上述结论,我们提出了自适应损失重量调整(ALWA)以根据损失的统计特征来解决优化基于锚的方法的不平衡。通过将Alwa纳入以前的最先进的探测器,我们在Pascal VOC和MS Coco上实现了显着的性能增益,即使是L1,Smoothl1和Ciou丢失。代码可在https://github.com/ywx-hub/alwa获得。
translated by 谷歌翻译
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
translated by 谷歌翻译
Despite significant progress in object categorization, in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to 310K class vocabulary on Animal with Attributes and ImageNet datasets.
translated by 谷歌翻译
Advances in computer vision and machine learning techniques have led to significant development in 2D and 3D human pose estimation from RGB cameras, LiDAR, and radars. However, human pose estimation from images is adversely affected by occlusion and lighting, which are common in many scenarios of interest. Radar and LiDAR technologies, on the other hand, need specialized hardware that is expensive and power-intensive. Furthermore, placing these sensors in non-public areas raises significant privacy concerns. To address these limitations, recent research has explored the use of WiFi antennas (1D sensors) for body segmentation and key-point body detection. This paper further expands on the use of the WiFi signal in combination with deep learning architectures, commonly used in computer vision, to estimate dense human pose correspondence. We developed a deep neural network that maps the phase and amplitude of WiFi signals to UV coordinates within 24 human regions. The results of the study reveal that our model can estimate the dense pose of multiple subjects, with comparable performance to image-based approaches, by utilizing WiFi signals as the only input. This paves the way for low-cost, broadly accessible, and privacy-preserving algorithms for human sensing.
translated by 谷歌翻译
With the increasing ability of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions only based on contexts augmented with a few training examples. It has been a new trend exploring ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress, challenges, and future work in ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques of ICL, including training strategies, prompting strategies, and so on. Finally, we present the challenges of ICL and provide potential directions for further research. We hope our work can encourage more research on uncovering how ICL works and improving ICL in future work.
translated by 谷歌翻译
Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods build the network with several modules like CNN, LSTM and Attention. Recent methods combine the Transformer with these modules for better performance. However, it requires tedious optimization skills to train a network composed of mixed modules, making these methods inconvenient to be used in practice. In this paper, we propose to design \emph{pure Transformer-based networks} for deep RL, aiming at providing off-the-shelf backbones for both the online and offline settings. Specifically, the Transformer in Transformer (TIT) backbone is proposed, which cascades two Transformers in a very natural way: the inner one is used to process a single observation, while the outer one is responsible for processing the observation history; combining both is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT can achieve satisfactory performance in different settings, consistently.
translated by 谷歌翻译
Recently the deep learning has shown its advantage in representation learning and clustering for time series data. Despite the considerable progress, the existing deep time series clustering approaches mostly seek to train the deep neural network by some instance reconstruction based or cluster distribution based objective, which, however, lack the ability to exploit the sample-wise (or augmentation-wise) contrastive information or even the higher-level (e.g., cluster-level) contrastiveness for learning discriminative and clustering-friendly representations. In light of this, this paper presents a deep temporal contrastive clustering (DTCC) approach, which for the first time, to our knowledge, incorporates the contrastive learning paradigm into the deep time series clustering research. Specifically, with two parallel views generated from the original time series and their augmentations, we utilize two identical auto-encoders to learn the corresponding representations, and in the meantime perform the cluster distribution learning by incorporating a k-means objective. Further, two levels of contrastive learning are simultaneously enforced to capture the instance-level and cluster-level contrastive information, respectively. With the reconstruction loss of the auto-encoder, the cluster distribution loss, and the two levels of contrastive losses jointly optimized, the network architecture is trained in a self-supervised manner and the clustering result can thereby be obtained. Experiments on a variety of time series datasets demonstrate the superiority of our DTCC approach over the state-of-the-art.
translated by 谷歌翻译
Active tracking of space noncooperative object that merely relies on vision camera is greatly significant for autonomous rendezvous and debris removal. Considering its Partial Observable Markov Decision Process (POMDP) property, this paper proposes a novel tracker based on deep recurrent reinforcement learning, named as RAMAVT which drives the chasing spacecraft to follow arbitrary space noncooperative object with high-frequency and near-optimal velocity control commands. To further improve the active tracking performance, we introduce Multi-Head Attention (MHA) module and Squeeze-and-Excitation (SE) layer into RAMAVT, which remarkably improve the representative ability of neural network with almost no extra computational cost. Extensive experiments and ablation study implemented on SNCOAT benchmark show the effectiveness and robustness of our method compared with other state-of-the-art algorithm. The source codes are available on https://github.com/Dongzhou-1996/RAMAVT.
translated by 谷歌翻译
Forecasts by the European Centre for Medium-Range Weather Forecasts (ECMWF; EC for short) can provide a basis for the establishment of maritime-disaster warning systems, but they contain some systematic biases.The fifth-generation EC atmospheric reanalysis (ERA5) data have high accuracy, but are delayed by about 5 days. To overcome this issue, a spatiotemporal deep-learning method could be used for nonlinear mapping between EC and ERA5 data, which would improve the quality of EC wind forecast data in real time. In this study, we developed the Multi-Task-Double Encoder Trajectory Gated Recurrent Unit (MT-DETrajGRU) model, which uses an improved double-encoder forecaster architecture to model the spatiotemporal sequence of the U and V components of the wind field; we designed a multi-task learning loss function to correct wind speed and wind direction simultaneously using only one model. The study area was the western North Pacific (WNP), and real-time rolling bias corrections were made for 10-day wind-field forecasts released by the EC between December 2020 and November 2021, divided into four seasons. Compared with the original EC forecasts, after correction using the MT-DETrajGRU model the wind speed and wind direction biases in the four seasons were reduced by 8-11% and 9-14%, respectively. In addition, the proposed method modelled the data uniformly under different weather conditions. The correction performance under normal and typhoon conditions was comparable, indicating that the data-driven mode constructed here is robust and generalizable.
translated by 谷歌翻译